We will use the Eurostat indicator “Gender pay gap in unadjusted form” to explore geographical and time trends in the gender pay gap in the European Union (EU) and to compare Portugal with other EU countries.

Objective

The objective is to look at the geographical and time trends in the data. We will answer the following questions:

  • What are the time trends for Portugal?
  • How does Portugal compare to other European countries?
  • Which countries have the largest and smallest pay gap in Europe over time?

Understanding the Data

Gender Pay Gap in Unadjusted Form

Unit of Measure: % of average gross hourly earnings of men.

The indicator measures the difference between average gross hourly earnings of male paid employees and of female paid employees as a percentage of average gross hourly earnings of male paid employees. The indicator has been defined as unadjusted, because it gives an overall picture of gender inequalities in terms of pay and measures a concept which is broader than the concept of equal pay for equal work. All employees working in firms with ten or more employees, without restrictions for age and hours worked, are included.
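
The definition above can be written as a formula. The symbols are our own shorthand and not Eurostat notation: $\bar{E}_m$ and $\bar{E}_f$ stand for the average gross hourly earnings of male and female paid employees.

```latex
\[
\mathrm{GPG} = \frac{\bar{E}_m - \bar{E}_f}{\bar{E}_m} \times 100
\]
```

For illustration only (made-up numbers, not Eurostat figures): if men earn 20 per hour on average and women earn 17, the unadjusted gap is (20 - 17) / 20 × 100 = 15%.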

Taken from (https://ec.europa.eu/eurostat/databrowser/view/sdg_05_20/)

Data Source

The Eurostat gender pay gap data is from the “Structure of Earnings Survey (SES)” and is based on data reported by the countries.

The data is copyrighted by Eurostat; the Eurostat Copyright/Licence Policy applies.

Further Information

Please see (https://ec.europa.eu/eurostat/cache/metadata/en/sdg_05_20_esmsip2.htm) for further information about the data.

Loading Libraries

library(eurostat)
library(data.table)
library(magrittr)
library(knitr)
library(kableExtra)
library(ggplot2)
library(ggrepel)
library(gganimate)
library(gifski)
library(gghighlight)

Data processing

Download data from Eurostat

We download all the available pay gap data (indicator code sdg_05_20) from Eurostat.

# Get all EU data in one go and keep the country code (`geo_code`)
pgapEU <- get_eurostat(id="sdg_05_20", time_format = "num") %>% 
  label_eurostat(., code = "geo")

# We will work with the `data.table` package.
setDT(pgapEU)

# Minimum and maximum available year
minYear <- min(pgapEU$time, na.rm = TRUE)
maxYear <- max(pgapEU$time, na.rm = TRUE)

#_# To do 
# Best to get the map data first so we can merge it directly to the indicator data.
# mapEU <- get_eurostat_geospatial(nuts_level = 0)
# setDT(mapEU)

# Update the `geo` variable to make it print and plot friendly.
pgap <- pgapEU[, geo_orig := geo] %>%
  .[, geo := fifelse(geo_code == "DE", "Germany", geo)] %>%
  .[, geo := fifelse(grepl("^EU|^EA", geo_code), gsub("_", " ", geo_code), geo)] %>% 
# Create geo_label_right / geo_label_left so each line is labelled only once
# (at the last / first year available for that country).
  .[, geo_label_right := ifelse(time == max(time), geo_code, ""), .(geo_code)] %>% 
  .[, geo_label_left := ifelse(time == min(time), geo_code, ""), .(geo_code)] %>% 
  .[, c("nace_r2"):=NULL] %>% 
# Adding a factor time variable (with levels in reverse)
  .[, timeF := factor(time, levels = c(maxYear:minYear))]
 #_# To do
 # mutate (cat = cut_to_classes (values, n = 4, decimals = 1))
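
The label-once step in the pipeline above can be checked on a small table. This is a minimal sketch with toy rows (the `toy` table and its values are made up for illustration); it only assumes the `data.table` package already loaded above.

```r
library(data.table)

# Toy rows for illustration only; not Eurostat data.
toy <- data.table(
  geo_code = c("PT", "PT", "PT", "ES", "ES"),
  time     = c(2006, 2007, 2008, 2002, 2003)
)

# Within each country, keep the code only on the row with the latest year,
# so a plot label is drawn once per line instead of once per point.
toy[, label_right := fifelse(time == max(time), geo_code, ""), by = geo_code]

toy
```

Because the grouping is by `geo_code`, `max(time)` is each country's own last year, so a country whose series ends early is still labelled.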

We will highlight some countries to compare Portugal with.

# Define a list of countries of interest that will be used later.
# Eurostat uses the code "EL" (not "GR") for Greece.
ct <-  c("AT", "BE", "DE", "ES", "FR", "NL", "IT", "PT", "EL", "EU27_2020")
PTEU <-  c("PT", "EU27_2020")

Data summaries

Some data summaries to understand the data that we have.

# Distinct years
pgap[, c(time)] %>% unique(.) %>% sort(.)
##  [1] 2002 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015 2016 2017 2018
# Information by country
pgap[, .(countryData = sprintf("%2s %15s: %4.0f-%4.0f (%2.0f)", geo_code, geo, min(time), max(time), .N)),
     .(geo_code, geo)] %>% 
    .[, c(countryData)] %>% 
  unique(.) %>%  sort(.)
##  [1] "AT         Austria: 2006-2018 (13)"       
##  [2] "BE         Belgium: 2006-2018 (13)"       
##  [3] "BG        Bulgaria: 2002-2018 (14)"       
##  [4] "CH     Switzerland: 2006-2017 (11)"       
##  [5] "CY          Cyprus: 2002-2018 (14)"       
##  [6] "CZ         Czechia: 2002-2018 (14)"       
##  [7] "DE         Germany: 2006-2018 (13)"       
##  [8] "DK         Denmark: 2006-2018 (13)"       
##  [9] "EA19            EA19: 2010-2018 ( 9)"     
## [10] "EE         Estonia: 2006-2018 (13)"       
## [11] "EL          Greece: 2002-2018 ( 7)"       
## [12] "ES           Spain: 2002-2018 (14)"       
## [13] "EU27_2007       EU27 2007: 2006-2006 ( 1)"
## [14] "EU27_2020       EU27 2020: 2010-2018 ( 9)"
## [15] "EU28            EU28: 2010-2018 ( 9)"     
## [16] "FI         Finland: 2006-2018 (13)"       
## [17] "FR          France: 2006-2018 (13)"       
## [18] "HR         Croatia: 2010-2018 ( 6)"       
## [19] "HU         Hungary: 2002-2018 (14)"       
## [20] "IE         Ireland: 2002-2017 (13)"       
## [21] "IS         Iceland: 2007-2018 (12)"       
## [22] "IT           Italy: 2006-2018 (13)"       
## [23] "LT       Lithuania: 2002-2018 (14)"       
## [24] "LU      Luxembourg: 2006-2018 (13)"       
## [25] "LV          Latvia: 2006-2018 (13)"       
## [26] "ME      Montenegro: 2014-2014 ( 1)"       
## [27] "MK North Macedonia: 2014-2014 ( 1)"       
## [28] "MT           Malta: 2006-2018 (13)"       
## [29] "NL     Netherlands: 2002-2018 (14)"       
## [30] "NO          Norway: 2006-2018 (13)"       
## [31] "PL          Poland: 2002-2018 (14)"       
## [32] "PT        Portugal: 2006-2018 (13)"       
## [33] "RO         Romania: 2002-2018 (14)"       
## [34] "RS          Serbia: 2014-2018 ( 2)"       
## [35] "SE          Sweden: 2006-2018 (13)"       
## [36] "SI        Slovenia: 2002-2018 (14)"       
## [37] "SK        Slovakia: 2002-2018 (14)"       
## [38] "TR          Turkey: 2006-2014 ( 2)"       
## [39] "UK  United Kingdom: 2002-2018 (14)"

Evolution of Gender Pay Gap in EU over Time

Line graph

pg01 <- pgap[geo_code %chin% ct] %>% 
ggplot(aes(x = time, y= values, color = geo, label = geo)) + 
  geom_line (alpha = .8, size = 1) +
  geom_point() +
  scale_y_continuous(breaks = seq(0, 30, 5), limits = c(0,30)) +
  scale_x_continuous(breaks = seq(2002, 2018, 2), limits = c(2002, 2019)) +
  theme(legend.position = "none") + 
  geom_text_repel(aes(label=geo_label_right),
                  direction = "y",
                  nudge_x = .85,
                  segment.alpha = 0.2,
                  segment.color = "grey80") +
  labs(title = "Gender Pay Gap Over Time",
       x= "Year", 
       y= "% Average Difference",
       caption = "Unadjusted % difference between average gross hourly earnings of male paid employees and of females.")

pg01

Animated Line graph

pg01 +
  geom_text_repel(aes(label = geo_code),
                  direction = "y",
                  nudge_x = .75,
                  segment.alpha = 0.7,
                  segment.colour = "grey80") +
  # pg01 already draws the points, so only the animation step is added here
  transition_reveal(time)

Portugal vs. European Union, 2010-2018

Portugal has no data before 2006, and the EU aggregate (EU27_2020) is only available from 2010 onwards.

pg02 <- pgap[geo_code %chin% PTEU] %>% 
  ggplot(aes(x = time, y= values, color = geo, label = geo)) + 
  geom_line(data = pgap, aes(x = time, y= values, group = geo), colour ="grey70", alpha = .5) +
  geom_line (alpha = .8, size = 1) +
  scale_y_continuous(breaks = seq(0, 30, 5), limits = c(0,30)) +
  scale_x_continuous(breaks = seq(2002, 2018, 2), limits = c(2002, 2019)) +
  theme(legend.position = "none") + 
  geom_text_repel(aes(label=geo_label_right),
                  direction = "y",
                  nudge_x = .45,
                  segment.alpha = 0.7) +
  labs(title = "Gender Pay Gap, 2002-2018",
       x= "Year", 
       y= "% Average Difference",
       caption = "Unadjusted % difference between average gross hourly earnings of male paid employees and of females.")
pg02
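
Since the two series only overlap from 2010 onwards, a numeric comparison has to be restricted to the years both have in common. The following is a minimal sketch of one way to do that with `data.table`; the `toy` table, its values and the `diff_pt_eu` column are made up for illustration and are not Eurostat figures.

```r
library(data.table)

# Toy values for illustration only; these are NOT Eurostat figures.
toy <- data.table(
  geo_code = c("PT", "PT", "PT", "EU27_2020", "EU27_2020"),
  time     = c(2009, 2010, 2011, 2010, 2011),
  values   = c(10, 12, 13, 16, 15)
)

# One column per geography; a year missing for either side becomes NA.
wide <- dcast(toy, time ~ geo_code, value.var = "values")

# Keep only the years where both series are present, then take the difference.
common <- wide[!is.na(PT) & !is.na(EU27_2020)]
common[, diff_pt_eu := PT - EU27_2020]

common
```

With the real data, the same steps could be applied to `pgap[geo_code %chin% PTEU]`; a negative difference would mean Portugal's gap is below the EU aggregate in that year.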

Gender Pay Gap - Bar Charts

pgap[time %in% c(2010, 2014, 2018) & geo_code %chin% ct] %>% 
ggplot (aes(x = reorder(geo_code, values), y = values, fill = timeF)) + 
  geom_bar(stat = "identity", alpha=.8, width=.8, position = "dodge") +  
  # facet_wrap(~time, scales = "free_x") +
  # gghighlight(geo_code == "PT") +
  labs(title = "Gender Pay Gap Over Time",
       x = "", 
       y = "% Average Difference",
       fill = "",
       caption = "Unadjusted % difference between average gross hourly earnings of male paid employees and of females.") +
  scale_fill_discrete(guide = guide_legend(reverse = TRUE)) +
  theme_minimal() +
  theme(axis.text.x = element_text(size = 6), legend.position = "bottom") +
  coord_flip()

Gender Pay Gap - Bar Charts

TO DO